Data specifics

  • The Longitudinal Employer-Household Dynamics (LEHD) program at the US Census Bureau releases the LEHD Origin-Destination Employment Statistics (LODES) datasets annually, based on employer-employee insurance records.
  • This datafile uses the Origin-Destination (OD) data files from LEHD. The OD data file lists each pair of census blocks where workers live and work, which lets us calculate average commute distance from the distance between each home and workplace census block pair.
  • Distance calculations: All distances are "as the crow flies" and calculated using the Vincenty Ellipsoid method, based on the latitudes and longitudes of the centroids of each census block. These distances are then aggregated to the census block group and tract level. Distances are likely to underestimate actual road travel.
  • Data presented here are from 2018 and spatial units are based on the 2010 census. As of July 2021, 2018 is the most recent year for which data are available. The earliest year for which data are available is 2002.
  • There are two datafiles presented here:
    • Charlottesville region residents: The first contains average and median commute distances for each spatial unit (SU), calculated for the following groups: (1) people who live and work in the Charlottesville region; (2) Charlottesville area residents who work within 40 miles of their home census block; (3) Charlottesville area residents who work in a tract outside the Charlottesville area that employs at least 25 Charlottesville area residents; (4) all Charlottesville area residents represented in the LODES OD data. The file also contains the number of residents within each SU who fall into each of those 4 categories.
    • Charlottesville region workers: The second contains average and median commute distances for each SU, calculated for the following groups: (1) Charlottesville area workers who live within 40 miles of their workplace census block; (2) Charlottesville area workers who live in a tract that (a) is outside the Charlottesville area and (b) is home to 25 or more Charlottesville area workers; and (3) all Charlottesville area workers represented in the data. This data file also contains the percent of workers in each SU who live outside the Charlottesville region.
  • Some limitations:
    • The data are prone to imperfect geocoding for certain jobs; jobs for companies with multiple branches are often all coded to the same location. Distance calculations are therefore likely to be an overestimate if many residents within one SU are employed by a company with multiple branches or a company whose headquarters is far away. There is also no way to identify remote workers or to know how often any worker actually travels to their place of employment (note: these data were collected prior to the COVID-19 pandemic, when fewer people were working remotely). For these reasons, we include calculations of average and median commute distances based on multiple groups of workers. The estimates based on all residents in an SU are most likely to be an overestimate, while those based on residents working within 40 miles of home are likely to be the most conservative.
    • The distances are "as the crow flies" and are therefore imprecise estimates of actual commute distances on the road.
    • These data do not include workers in defense-related industries.
    • Student-workers are unlikely to be represented in these data because their jobs are not typically covered by state unemployment insurance.
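
The distance and aggregation steps described above can be sketched as follows. This is a minimal illustration, not the code used to build these files: the block codes, centroid coordinates, and job counts are made up, and weighting by the OD job count column (S000) is an assumption about how the averages were formed. It uses distVincentyEllipsoid() from the geosphere package, which returns meters.

```r
library(geosphere)  # provides distVincentyEllipsoid()

# Hypothetical OD rows: home block, work block, and job count (S000 in LODES OD files)
od <- data.frame(
  h_geocode = c("510030101001000", "510030101001000", "510030101002000"),
  w_geocode = c("515400002011000", "517600305001000", "515400002011000"),
  S000      = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical block centroids (longitude, latitude), invented for illustration
centroids <- data.frame(
  geocode = c("510030101001000", "510030101002000",
              "515400002011000", "517600305001000"),
  lon = c(-78.70, -78.65, -78.48, -77.44),
  lat = c(38.20, 38.18, 38.03, 37.54),
  stringsAsFactors = FALSE
)

# Look up the home and work centroid for every OD pair
home <- as.matrix(centroids[match(od$h_geocode, centroids$geocode), c("lon", "lat")])
work <- as.matrix(centroids[match(od$w_geocode, centroids$geocode), c("lon", "lat")])

# Ellipsoidal straight-line distance in meters, converted to miles
od$dist_mi <- distVincentyEllipsoid(home, work) / 1609.344

# Flag pairs within 40 miles, mirroring the "within 40 miles" group
od$within40 <- od$dist_mi <= 40

# The first 11 digits of a 15-digit block code identify the census tract
od$tract <- substr(od$h_geocode, 1, 11)

# Aggregate to the home tract, counting each pair once per job (assumed weighting)
by_tract <- do.call(rbind, lapply(split(od, od$tract), function(d) {
  data.frame(
    tract     = d$tract[1],
    avgc      = weighted.mean(d$dist_mi, d$S000),
    medc      = median(rep(d$dist_mi, d$S000)),
    commuters = sum(d$S000)
  )
}))

print(by_tract)
```

Restricting `od` to rows where `within40` is TRUE before aggregating would give the more conservative within-40-miles estimates.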

Data preview

Charlottesville Area residents

glimpse(lodesresidents)
## Rows: 50
## Columns: 14
## $ tract               <dbl> 51003010100, 51003010201, 51003010202, 51003010300…
## $ medc_alltr          <dbl> 27.97247, 20.30768, 18.34769, 19.20226, 24.75778, …
## $ medc_workinRegiontr <dbl> 7.568706, 4.736960, 2.938385, 4.120100, 9.673803, …
## $ medc_within40tr     <dbl> 9.373904, 6.713784, 4.345703, 5.958594, 11.009505,…
## $ medc_25_employeestr <dbl> 20.59655, 13.97487, 13.88836, 13.94813, 19.50496, …
## $ commutersIntr       <int> 2124, 2387, 1477, 3875, 1890, 1397, 2436, 2625, 17…
## $ commuterinRegiontr  <int> 1460, 1795, 1133, 2937, 1320, 1020, 1897, 1905, 13…
## $ commuterw40tr       <int> 1629, 1967, 1208, 3202, 1491, 1116, 2057, 2103, 15…
## $ commuter25tr        <int> 1948, 2240, 1389, 3646, 1749, 1304, 2312, 2446, 16…
## $ avgc_alltr          <dbl> 29.49535, 20.02431, 19.23946, 19.34351, 25.56457, …
## $ avgc_workinRegiontr <dbl> 8.138391, 4.976508, 3.208846, 4.316983, 10.119677,…
## $ avgc_within40tr     <dbl> 9.722919, 6.784946, 4.543631, 6.040549, 11.349601,…
## $ avgc_25_employeestr <dbl> 22.85983, 15.48127, 15.25275, 14.75717, 20.70080, …
## $ county              <int> 51003, 51003, 51003, 51003, 51003, 51003, 51003, 5…
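
As a small worked example, the count columns can be turned into shares of each tract's commuters. The sketch below retypes the first three tracts from the preview above rather than loading lodesresidents, and the share column names are hypothetical.

```r
# First three tracts, copied from the glimpse() output above
res <- data.frame(
  tract              = c("51003010100", "51003010201", "51003010202"),
  commutersIntr      = c(2124, 2387, 1477),
  commuterinRegiontr = c(1460, 1795, 1133),
  commuterw40tr      = c(1629, 1967, 1208),
  stringsAsFactors   = FALSE
)

# Percent of each tract's commuters who work inside the region (hypothetical column name)
res$pct_inRegion <- 100 * res$commuterinRegiontr / res$commutersIntr

# Percent who work within 40 miles of home (hypothetical column name)
res$pct_within40 <- 100 * res$commuterw40tr / res$commutersIntr

print(res[, c("tract", "pct_inRegion", "pct_within40")], digits = 3)
```

The larger the gap between these shares and 100, the more the all-residents averages for that tract are driven by long-distance pairs.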

Charlottesville Area workers

glimpse(lodesworkers)
## Rows: 50
## Columns: 15
## $ tract                            <dbl> 51003010100, 51003010201, 51003010202…
## $ medc_allworkerstr                <dbl> 13.47568, 45.27164, 22.76243, 31.3502…
## $ medc_livewithin40tr              <dbl> 10.161318, 9.956167, 10.347764, 9.727…
## $ medc_25_restr                    <dbl> 13.17269, 18.92537, 16.69125, 15.5441…
## $ workersIntr                      <int> 520, 759, 331, 3604, 1040, 1242, 7251…
## $ liveoutsideCvilletr              <int> 92, 384, 120, 1535, 484, 697, 3257, 1…
## $ liveinsideCvilletr               <int> 428, 375, 211, 2069, 556, 545, 3994, …
## $ perc_workers_liveinCvilletr      <dbl> 82.30769, 49.40711, 63.74622, 57.4084…
## $ perc_workers_liveoutsideCvilletr <dbl> 17.69231, 50.59289, 36.25378, 42.5915…
## $ workersw40tr                     <int> 487, 455, 272, 2633, 702, 676, 5391, …
## $ workers25tr                      <int> 509, 548, 305, 2966, 826, 930, 6086, …
## $ avgc_allworkerstr                <dbl> 14.14282, 39.35919, 23.12581, 31.2667…
## $ avgc_livewithin40tr              <dbl> 10.537103, 9.063723, 10.241915, 10.25…
## $ avgc_25_restr                    <dbl> 12.61945, 18.82021, 16.24600, 16.4914…
## $ county                           <int> 51003, 51003, 51003, 51003, 51003, 51…
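
The percent columns can be reproduced from the count columns, which is a quick way to confirm how they are defined. This sketch retypes the first three tracts from the preview above instead of loading lodesworkers.

```r
# First three tracts, copied from the glimpse() output above
wrk <- data.frame(
  workersIntr         = c(520, 759, 331),
  liveoutsideCvilletr = c(92, 384, 120),
  liveinsideCvilletr  = c(428, 375, 211)
)

# Inside and outside counts should add up to the tract's total workers
stopifnot(all(wrk$liveoutsideCvilletr + wrk$liveinsideCvilletr == wrk$workersIntr))

# Percent of workers living outside the region, recomputed from the counts
perc_outside <- 100 * wrk$liveoutsideCvilletr / wrk$workersIntr

# Compare with the values of perc_workers_liveoutsideCvilletr shown in the preview
print(round(perc_outside, 5))
```

For these three tracts the recomputed values agree with the previewed perc_workers_liveoutsideCvilletr values to five decimal places.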

Variable descriptions

Charlottesville Area residents

metaresidents %>% 
  filter(su_tract == 1) %>%
  select(varname, about) %>% as.list()
## $varname
##  [1] "tract"               "county"              "avgc_alltr"         
##  [4] "medc_alltr"          "commutersIntr"       "avgc_within40tr"    
##  [7] "medc_within40tr"     "commuterw40tr"       "avgc_25_employeestr"
## [10] "medc_25_employeestr" "commuter25tr"        "avgc_workinRegiontr"
## [13] "medc_workinRegiontr" "commuterinRegiontr" 
## 
## $about
##  [1] "11-digit census tract code"                                                                                                                                                     
##  [2] "5-digit county code"                                                                                                                                                            
##  [3] "Average \"as the crow flies\" commuting distance for all residents in the census tract"                                                                                         
##  [4] "Median \"as the crow flies\" commuting distance for all residents in the census tract"                                                                                          
##  [5] "The number of residents in each census tract who are represented in the data"                                                                                                   
##  [6] "Average \"as the crow flies\" commuting distance for residents of the census tract who work within 40 miles"                                                                    
##  [7] "Median \"as the crow flies\" commuting distance for residents of the census tract who work within 40 miles"                                                                     
##  [8] "The number of residents in the census tract who work within 40 miles of home"                                                                                                   
##  [9] "Average \"as the crow flies\" commuting distance for residents of the census tract who commute to a census tract that employs at least 25 residents from the region of interest"
## [10] "Median \"as the crow flies\" commuting distance for residents of the census tract who commute to a census tract that employs at least 25 residents of the region of interest"   
## [11] "The number of residents in the census tract who commute to a census tract that employs at least 25 residents of the region of interest"                                         
## [12] "Average \"as the crow flies\" commuting distance for residents of the census tract who work in the same region as where they live"                                              
## [13] "Median \"as the crow flies\" commuting distance for residents of the census tract who work in the same region as where they live"                                               
## [14] "The number of residents living in the census tract who commute to work within the region of interest"

Charlottesville Area workers

metaworkers %>% 
  filter(su_tract == 1) %>%
  select(varname, about) %>% as.list()
## $varname
##  [1] "county"                           "tract"                           
##  [3] "medc_allworkerstr"                "medc_livewithin40tr"             
##  [5] "medc_25_restr"                    "workersIntr"                     
##  [7] "liveoutsideCvilletr"              "liveinsideCvilletr"              
##  [9] "perc_workers_liveinCvilletr"      "perc_workers_liveoutsideCvilletr"
## [11] "workersw40tr"                     "workers25tr"                     
## [13] "avgc_allworkerstr"                "avgc_livewithin40tr"             
## [15] "avgc_25_restr"                   
## 
## $about
##  [1] "5-digit county code"                                                                                                                                          
##  [2] "11-digit census tract code where the workers are employed (workplace census tract)"                                                                           
##  [3] "Median \"as the crow flies\" commuting distance for all workers employed in the census tract"                                                                 
##  [4] "Median \"as the crow flies\" commuting distance for workers employed in the census tract who live within 40 miles of work"                                    
##  [5] "Median \"as the crow flies\" commute for workers employed in the census tract who live in census tract where at least 25 Charlottesville region workers live" 
##  [6] "The total number of workers employed in the census tract"                                                                                                     
##  [7] "The number of workers employed in the census tract that live outside the Charlottesville region"                                                              
##  [8] "The number of workers employed in the census tract who live inside the Charlottesville region"                                                                
##  [9] "The percent of workers employed in the census tract who live inside the Charlottesville region"                                                               
## [10] "The percent of workers employed in the census tract who live outside the Charlottesville region"                                                              
## [11] "The number of workers employed in the census tract who live within 40 miles of work"                                                                          
## [12] "The number of workers employed in the census tract who live in a census tract where at least 25 Charlottesville region workers live"                          
## [13] "Average \"as the crow flies\" commuting distance for all workers employed in the census tract"                                                                
## [14] "Average \"as the crow flies\" commuting distance for workers employed in the census tract who live within 40 miles of work"                                   
## [15] "Average \"as the crow flies\" commute for workers employed in the census tract who live in census tract where at least 25 Charlottesville region workers live"

Variable descriptives

Charlottesville Area residents

lodesresidents %>% select(avgc_alltr, avgc_within40tr, avgc_25_employeestr, avgc_workinRegiontr, medc_alltr, medc_within40tr, medc_25_employeestr, medc_workinRegiontr) %>% 
  select(where(~is.numeric(.x))) %>% 
  as.data.frame() %>% 
  stargazer(., type = "text", title = "Summary Statistics", digits = 2,
            summary.stat = c("mean", "sd", "min", "median", "max"))
## 
## Summary Statistics
## =====================================================
## Statistic           Mean  St. Dev.  Min  Median  Max 
## -----------------------------------------------------
## avgc_alltr          25.85   9.82   15.36 24.12  76.12
## avgc_within40tr     9.00    5.09   3.36   7.52  20.58
## avgc_25_employeestr 20.26   8.58   10.53 18.50  64.77
## avgc_workinRegiontr 7.39    5.77   1.52   5.04  19.82
## medc_alltr          25.60  10.18   14.15 24.06  78.38
## medc_within40tr     8.88    5.06   3.21   6.94  20.32
## medc_25_employeestr 19.81   8.72   9.78  18.35  64.84
## medc_workinRegiontr 7.15    5.64   1.37   4.76  18.63
## -----------------------------------------------------

Charlottesville Area workers

lodesworkers %>% select(avgc_allworkerstr, avgc_livewithin40tr, avgc_25_restr, medc_allworkerstr, medc_livewithin40tr, medc_25_restr, perc_workers_liveoutsideCvilletr) %>% 
  select(where(~is.numeric(.x))) %>% 
  as.data.frame() %>% 
  stargazer(., type = "text", title = "Summary Statistics", digits = 2,
            summary.stat = c("mean", "sd", "min", "median", "max"))
## 
## Summary Statistics
## ==================================================================
## Statistic                        Mean  St. Dev.  Min  Median  Max 
## ------------------------------------------------------------------
## avgc_allworkerstr                26.20   7.61   12.01 24.83  43.34
## avgc_livewithin40tr              10.59   1.92   7.55  10.61  16.80
## avgc_25_restr                    16.12   3.62   9.66  15.13  27.01
## medc_allworkerstr                25.81   9.59   10.46 23.66  50.84
## medc_livewithin40tr              10.48   2.16   6.82  10.48  18.37
## medc_25_restr                    15.79   4.23   8.29  15.04  29.68
## perc_workers_liveoutsideCvilletr 37.60  10.69   17.69 37.05  73.43
## ------------------------------------------------------------------

Visual distribution

Charlottesville Area residents

longr <- lodesresidents %>% select(c(tract, avgc_alltr, avgc_within40tr, avgc_25_employeestr, avgc_workinRegiontr, medc_alltr, medc_within40tr, medc_25_employeestr, medc_workinRegiontr)) %>% 
  pivot_longer(-tract, names_to = "measure", values_to = "value")
longr$measure <- factor(longr$measure,
                         levels = c("avgc_alltr", "medc_alltr", "avgc_within40tr", "medc_within40tr", "avgc_25_employeestr", "medc_25_employeestr", "avgc_workinRegiontr",
                                    "medc_workinRegiontr"))
longr %>% 
  ggplot(aes(x = value, fill = measure)) +
  scale_fill_viridis(option = "plasma", discrete = TRUE, guide = "none") +
  geom_histogram() + 
  facet_wrap(~measure, scales = "free", ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Charlottesville Area workers

longw <- lodesworkers %>% select(c(tract, avgc_allworkerstr, avgc_livewithin40tr, avgc_25_restr, medc_allworkerstr, medc_livewithin40tr, medc_25_restr)) %>% 
  pivot_longer(-tract, names_to = "measure", values_to = "value")
longw$measure <- factor(longw$measure,
                         levels = c("avgc_allworkerstr", "medc_allworkerstr", "avgc_livewithin40tr", "medc_livewithin40tr", "avgc_25_restr", "medc_25_restr"))
longw %>% 
  ggplot(aes(x = value, fill = measure)) +
  scale_fill_viridis(option = "plasma", discrete = TRUE, guide = "none") +
  geom_histogram() + 
  facet_wrap(~measure, scales = "free", ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Mapping the data

Full data

All Cville area residents

pal <- colorNumeric("plasma", reverse = TRUE, domain = cvl_lodesfull$avgc_alltr)
leaflet(cvl_lodesfull) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = cvl_lodesfull,
              fillColor = ~pal(avgc_alltr),
              weight = 1,
              opacity = 1,
              color = "white", 
              fillOpacity = 0.6,
              highlightOptions = highlightOptions(
                weight = 1, fillOpacity = 0.8, bringToFront = TRUE
              ),
              popup = paste0("GEOID: ", cvl_lodesfull$geocode, "<br>",
                             "Average commute (mi): ", round(cvl_lodesfull$avgc_alltr, 2))) %>% 
  addLegend("bottomright", pal = pal, values = cvl_lodesfull$avgc_alltr, 
            title = "Average commute (mi)", opacity = 0.7)

All Cville area workers

pal <- colorNumeric("plasma", reverse = TRUE, domain = cvl_lodesfull$avgc_allworkerstr)
leaflet(cvl_lodesfull) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = cvl_lodesfull,
              fillColor = ~pal(avgc_allworkerstr),
              weight = 1,
              opacity = 1,
              color = "white", 
              fillOpacity = 0.6,
              highlightOptions = highlightOptions(
                weight = 1, fillOpacity = 0.8, bringToFront = TRUE
              ),
              popup = paste0("GEOID: ", cvl_lodesfull$geocode, "<br>",
                             "Average commute (mi): ", round(cvl_lodesfull$avgc_allworkerstr, 2))) %>% 
  addLegend("bottomright", pal = pal, values = cvl_lodesfull$avgc_allworkerstr, 
            title = "Average commute (mi)", opacity = 0.7)

Within 40 miles

Cville area residents who work within 40 miles of home

pal <- colorNumeric("plasma", reverse = TRUE, domain = cvl_lodesfull$avgc_within40tr)
leaflet(cvl_lodesfull) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = cvl_lodesfull,
              fillColor = ~pal(avgc_within40tr),
              weight = 1,
              opacity = 1,
              color = "white", 
              fillOpacity = 0.6,
              highlightOptions = highlightOptions(
                weight = 1, fillOpacity = 0.8, bringToFront = TRUE
              ),
              popup = paste0("GEOID: ", cvl_lodesfull$geocode, "<br>",
                             "Average commute (mi): ", round(cvl_lodesfull$avgc_within40tr, 2))) %>% 
  addLegend("bottomright", pal = pal, values = cvl_lodesfull$avgc_within40tr, 
            title = "Average commute (mi)", opacity = 0.7)

Cville area workers who live within 40 miles of work

pal <- colorNumeric("plasma", reverse = TRUE, domain = cvl_lodesfull$avgc_livewithin40tr)
leaflet(cvl_lodesfull) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = cvl_lodesfull,
              fillColor = ~pal(avgc_livewithin40tr),
              weight = 1,
              opacity = 1,
              color = "white", 
              fillOpacity = 0.6,
              highlightOptions = highlightOptions(
                weight = 1, fillOpacity = 0.8, bringToFront = TRUE
              ),
              popup = paste0("GEOID: ", cvl_lodesfull$geocode, "<br>",
                             "Average commute (mi): ", round(cvl_lodesfull$avgc_livewithin40tr, 2))) %>% 
  addLegend("bottomright", pal = pal, values = cvl_lodesfull$avgc_livewithin40tr, 
            title = "Average commute (mi)", opacity = 0.7)

Cville area residents and workers only

People who live and work in the Charlottesville region

pal <- colorNumeric("plasma", reverse = TRUE, domain = cvl_lodesfull$avgc_workinRegiontr)
leaflet(cvl_lodesfull) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = cvl_lodesfull,
              fillColor = ~pal(avgc_workinRegiontr),
              weight = 1,
              opacity = 1,
              color = "white", 
              fillOpacity = 0.6,
              highlightOptions = highlightOptions(
                weight = 1, fillOpacity = 0.8, bringToFront = TRUE
              ),
              popup = paste0("GEOID: ", cvl_lodesfull$geocode, "<br>",
                             "Average commute (mi): ", round(cvl_lodesfull$avgc_workinRegiontr, 2))) %>% 
  addLegend("bottomright", pal = pal, values = cvl_lodesfull$avgc_workinRegiontr, 
            title = "Average commute (mi)", opacity = 0.7)

Percent of workers who live outside the Charlottesville region

pal <- colorNumeric("plasma", reverse = TRUE, domain = cvl_lodesfull$perc_workers_liveoutsideCvilletr)
leaflet(cvl_lodesfull) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = cvl_lodesfull,
              fillColor = ~pal(perc_workers_liveoutsideCvilletr),
              weight = 1,
              opacity = 1,
              color = "white", 
              fillOpacity = 0.6,
              highlightOptions = highlightOptions(
                weight = 1, fillOpacity = 0.8, bringToFront = TRUE
              ),
              popup = paste0("GEOID: ", cvl_lodesfull$geocode, "<br>",
                             "Percent of workers: ", round(cvl_lodesfull$perc_workers_liveoutsideCvilletr, 2))) %>% 
  addLegend("bottomright", pal = pal, values = cvl_lodesfull$perc_workers_liveoutsideCvilletr, 
            title = "Percent of workers <br> who live outside Cville <br> region", opacity = 0.7)
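
The map blocks in this section differ only in the column being mapped and in the legend and popup text, so they could be wrapped in one function. The sketch below is one possible way to do that; commute_map and its arguments are hypothetical names, and it assumes `data` is a polygon object like cvl_lodesfull with a `geocode` column.

```r
library(leaflet)

# Hypothetical helper: draws one choropleth for a single numeric column.
# `data` is assumed to be a polygon object (such as cvl_lodesfull) with a `geocode` column.
commute_map <- function(data, column,
                        legend_title = "Average commute (mi)",
                        popup_label  = "Average commute (mi): ") {
  values <- data[[column]]

  # Same palette settings as the maps above
  pal <- colorNumeric("plasma", domain = values, reverse = TRUE)

  leaflet(data) %>%
    addProviderTiles("CartoDB.Positron") %>%
    addPolygons(
      fillColor = pal(values),
      weight = 1,
      opacity = 1,
      color = "white",
      fillOpacity = 0.6,
      highlightOptions = highlightOptions(
        weight = 1, fillOpacity = 0.8, bringToFront = TRUE
      ),
      popup = paste0("GEOID: ", data$geocode, "<br>",
                     popup_label, round(values, 2))
    ) %>%
    addLegend("bottomright", pal = pal, values = values,
              title = legend_title, opacity = 0.7)
}
```

With this in place, a call such as commute_map(cvl_lodesfull, "avgc_within40tr") would stand in for the corresponding block, and the percent map would pass its own legend_title and popup_label.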